
    Automatic Summarization for Student Reflective Responses

    Educational research has demonstrated that asking students to respond to reflection prompts can improve both teaching and learning. However, summarizing student responses to these prompts is an onerous task for humans and poses challenges for existing summarization methods. From the input perspective, there are three challenges. First, there is a lexical variety problem, because different students tend to use different expressions. Second, there is a length variety problem, in that student inputs range from single words to multiple sentences. Third, there is a redundancy issue, since some content among student responses is not useful. From the output perspective, there are two additional challenges. First, the human summaries consist of a list of important phrases instead of sentences. Second, from an instructor's perspective, the number of students who have a particular problem or are interested in a particular topic is valuable. The goal of this research is to enhance student response summarization at multiple levels of granularity. At the sentence level, we propose a novel summarization algorithm that extends the traditional ILP-based framework with a low-rank matrix approximation to address the challenge of lexical variety. At the phrase level, we propose a phrase summarization framework that combines phrase extraction, phrase clustering, and phrase ranking. Experimental results show its effectiveness on multiple student response data sets. Also at the phrase level, we propose a quantitative phrase summarization algorithm that estimates the number of students who semantically mention the phrases in a summary. We first introduce a new phrase-based highlighting scheme for automatic summarization. It highlights the phrases in the human summaries as well as the corresponding semantically equivalent phrases in student responses. Enabled by the highlighting scheme, we improve the previous phrase-based summarization framework by developing supervised candidate phrase extraction, learning to estimate phrase similarities, and experimenting with different clustering algorithms to group phrases into clusters. Experimental results show that our proposed methods not only yield better summarization performance evaluated using ROUGE, but also produce summaries that capture the most pressing student needs.
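
    The phrase-level pipeline described in this abstract (candidate extraction, clustering of similar phrases, and ranking, with each summary phrase tagged by a count) can be pictured with a small, self-contained sketch. The example below is only an approximation of that idea: the comma-splitting extractor, TF-IDF features, agglomerative clustering, and helper names are assumptions for illustration, not the dissertation's components, which use supervised extraction and learned phrase similarities.

```python
from collections import Counter

from sklearn.cluster import AgglomerativeClustering
from sklearn.feature_extraction.text import TfidfVectorizer


def extract_candidate_phrases(responses):
    """Toy extractor: split each response on commas.

    The dissertation uses supervised candidate phrase extraction; this
    stand-in only keeps the pipeline runnable end to end.
    """
    phrases = []
    for response in responses:
        phrases.extend(p.strip() for p in response.split(",") if p.strip())
    return phrases


def summarize_responses(responses, n_clusters=3):
    phrases = extract_candidate_phrases(responses)
    features = TfidfVectorizer().fit_transform(phrases).toarray()

    # Group lexically similar phrases (a crude proxy for semantic clustering).
    labels = AgglomerativeClustering(n_clusters=n_clusters).fit_predict(features)

    # Rank clusters by size and report one member phrase per cluster;
    # the cluster size serves as a rough stand-in for a student count.
    summary = []
    for label, size in Counter(labels).most_common():
        representative = next(p for p, l in zip(phrases, labels) if l == label)
        summary.append((representative, size))
    return summary


if __name__ == "__main__":
    demo_responses = [
        "the proof of the chain rule, chain rule examples",
        "chain rule, integration by parts",
        "integration by parts was confusing",
    ]
    for phrase, count in summarize_responses(demo_responses):
        print(f"{count} response(s): {phrase}")
```

    The cluster size printed here is only a rough stand-in for the per-phrase student count that the quantitative phrase summarization algorithm estimates with learned semantic matching.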

    A Novel ILP Framework for Summarizing Content with High Lexical Variety

    Summarizing content contributed by individuals can be challenging, because people make different lexical choices even when describing the same events. However, there remains a significant need to summarize such content. Examples include student responses to post-class reflective questions, product reviews, and news articles published by different news agencies related to the same events. High lexical diversity of these documents hinders the system's ability to effectively identify salient content and reduce summary redundancy. In this paper, we overcome this issue by introducing an integer linear programming-based summarization framework. It incorporates a low-rank approximation to the sentence-word co-occurrence matrix to intrinsically group semantically similar lexical items. We conduct extensive experiments on datasets of student responses, product reviews, and news documents. Our approach compares favorably to a number of extractive baselines as well as a neural abstractive summarization system. The paper finally sheds light on when and why the proposed framework is effective at summarizing content with high lexical variety. Comment: Accepted for publication in the journal Natural Language Engineering, 201
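
    The core technical idea here, a low-rank approximation of the sentence-word co-occurrence matrix so that semantically related words reinforce each other, can be sketched in a few lines. The rank, the toy sentences, and the greedy scoring at the end are illustrative assumptions; the actual framework selects sentences by solving an integer linear program rather than by the simple argmax shown below.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus standing in for student responses; any short documents work.
sentences = [
    "the slides on recursion were confusing",
    "recursion examples went too fast",
    "the lecture pace was fine",
]

# Binary sentence-word co-occurrence matrix A (rows: sentences, columns: words).
vectorizer = CountVectorizer(binary=True)
A = vectorizer.fit_transform(sentences).toarray().astype(float)

# Rank-k approximation A_k = U_k S_k V_k^T; related words now share weight
# through latent dimensions instead of requiring exact string matches.
k = 2
U, S, Vt = np.linalg.svd(A, full_matrices=False)
A_k = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]

# The full framework feeds A_k into an ILP that jointly selects sentences
# and concepts under a length budget; as a stand-in, score each sentence
# by its reconstructed concept coverage.
scores = A_k.sum(axis=1)
print(sentences[int(np.argmax(scores))])
```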

    Neural Chinese Word Segmentation with Lexicon and Unlabeled Data via Posterior Regularization

    Existing methods for Chinese word segmentation (CWS) usually rely on a large number of labeled sentences to train word segmentation models, which are expensive and time-consuming to annotate. Luckily, unlabeled data is usually easy to collect, and many high-quality Chinese lexicons are off-the-shelf, both of which can provide useful information for CWS. In this paper, we propose a neural approach for Chinese word segmentation which can exploit both lexicon and unlabeled data. Our approach is based on a variant of the posterior regularization algorithm, and the unlabeled data and lexicon are incorporated into model training as indirect supervision by regularizing the prediction space of CWS models. Extensive experiments on multiple benchmark datasets in both in-domain and cross-domain scenarios validate the effectiveness of our approach. Comment: 7 pages, 11 figures, accepted by the 2019 World Wide Web Conference (WWW '19).
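
    The posterior-regularization idea, pulling the model's predictions on unlabeled sentences toward tag distributions suggested by lexicon matches, can be sketched independently of any particular neural architecture. In the sketch below, the {B, M, E, S} tag set, the forward maximum-matching heuristic, and the KL-style penalty are assumptions chosen for illustration, not the paper's exact formulation.

```python
import numpy as np

TAGS = ["B", "M", "E", "S"]  # begin / middle / end of a word, or single-char word


def lexicon_tag_prior(sentence, lexicon, smooth=0.05, max_word_len=6):
    """Forward maximum matching against the lexicon, returning a soft
    per-character distribution over TAGS (each row sums to one)."""
    n = len(sentence)
    prior = np.full((n, len(TAGS)), smooth)
    i = 0
    while i < n:
        for j in range(min(n, i + max_word_len), i, -1):
            # Prefer the longest lexicon word starting at i; fall back to
            # treating the character as a single-character word.
            if sentence[i:j] in lexicon or j == i + 1:
                if j == i + 1:
                    prior[i, TAGS.index("S")] += 1.0
                else:
                    prior[i, TAGS.index("B")] += 1.0
                    prior[j - 1, TAGS.index("E")] += 1.0
                    for mid in range(i + 1, j - 1):
                        prior[mid, TAGS.index("M")] += 1.0
                i = j
                break
    return prior / prior.sum(axis=1, keepdims=True)


def pr_penalty(model_probs, prior, eps=1e-9):
    """KL(prior || model), averaged over characters: a regularization term
    that could be added to the training objective for unlabeled sentences."""
    return float(np.mean(np.sum(prior * np.log((prior + eps) / (model_probs + eps)), axis=1)))
```

    In training, a penalty like `pr_penalty` would be weighted and added to the supervised loss, so the lexicon guides the model on unlabeled text without providing hard labels.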

    An Improved Phrase-Based Approach To Annotating And Summarizing Student Course Responses

    Teaching large classes remains a great challenge, primarily because it is difficult to attend to all the student needs in a timely manner. Automatic text summarization systems can be leveraged to summarize the student feedback submitted immediately after each lecture, but it remains to be discovered what makes a good summary of student responses. In this work, we explore a new methodology that effectively extracts summary phrases from the student responses. Each phrase is tagged with the number of students who raise the issue. The phrases are evaluated along two dimensions: with respect to text content, they should be informative and well-formed, measured by the ROUGE metric; additionally, they should attend to the most pressing student needs, measured by a newly proposed metric. This work is enabled by a phrase-based annotation and highlighting scheme, which is new to the summarization task. The phrase-based framework allows us to summarize the student responses into a set of bullet points and present them to the instructor promptly.
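
    One way to picture the "number of students who raise the issue" tag is to count, for each summary phrase, the students whose responses contain a sufficiently similar phrase. The sketch below uses TF-IDF cosine similarity with a fixed threshold and assumes one response per student; the paper instead relies on phrase-based annotation and learned semantic matching, so treat this as an illustration only.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def student_counts(summary_phrases, student_responses, threshold=0.5):
    """Count, for each summary phrase, how many student responses are
    lexically similar enough to it (one response per student assumed)."""
    vectorizer = TfidfVectorizer().fit(summary_phrases + student_responses)
    P = vectorizer.transform(summary_phrases)
    R = vectorizer.transform(student_responses)
    similarities = cosine_similarity(P, R)  # rows: phrases, columns: students
    return {
        phrase: int((similarities[i] >= threshold).sum())
        for i, phrase in enumerate(summary_phrases)
    }


print(student_counts(
    ["chain rule", "integration by parts"],
    ["the chain rule proof", "chain rule", "nothing was confusing"],
))
```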
